Effective SIMD Vectorization for Intel Xeon Phi Coprocessors



Related articles

Fine-tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices

Common techniques for fine-tuning the performance of automatically vectorized loops in applications for Intel Xeon Phi coprocessors are discussed. These techniques include strength reduction, regularizing the vectorization pattern, data alignment and aligned data hint, and pointer disambiguation. In addition, the loop tiling technique of memory traffic tuning is shown. The optimization methods ...


Lattice QCD on Intel® Xeon Phi™ coprocessors

Lattice Quantum Chromodynamics (LQCD) is currently the only known model-independent, non-perturbative computational method for calculations in the theory of the strong interactions, and is of importance in studies of nuclear and high energy physics. LQCD codes use large fractions of supercomputing cycles worldwide and are often amongst the first to be ported to new high performance computing arc...


Effective Barrier Synchronization on Intel Xeon Phi Coprocessor

Barriers are a fundamental synchronization primitive, underpinning the parallel execution models of many modern shared-memory parallel programming languages such as OpenMP, OpenCL or Cilk, and are one of the main challenges to scaling. State-of-the-art barrier synchronization algorithms differ in tradeoffs between critical path length, communication traffic patterns and memory footprint. In thi...


Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors

Many-Task Computing (MTC) aims to bridge the gap between HPC and HTC. MTC emphasizes running many computational tasks over a short period of time, where tasks can be either dependent or independent of one another. MTC has been well supported on Clouds, Grids, and Supercomputers on traditional computing architectures, but the abundance of hybrid large-scale systems using accelerators has motivat...


An Empirical Study of Intel Xeon Phi

With at least 50 cores, Intel Xeon Phi is a true manycore architecture. Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can reach a theoretical peak of 1000 GFLOPs and over 240 GB/s. These numbers, as well as its flexibility (it can be used either as a coprocessor or as a stand-alone processor), are very tempting for parallel applications looking for new...


Journal

Journal title: Scientific Programming

Year: 2015

ISSN: 1058-9244,1875-919X

DOI: 10.1155/2015/269764